Background

The development of single-cell RNA-seq (scRNA-seq) allows the identification of cell populations and transcriptionally characterization of the identified cells. However, the ways that cells were prepared for scRNA-seq lead to loss of spatial information. As the activities of individual cells are tightly regulated by the surrounding environment, incorporation of spatial information with the transcriptome is vital for the interpretation of their function. Taking the tumor microenvironment as an example, the functions of T cells are largely influenced by the neighboring myeloid cells, regulatory T cells, stromal cells and tumor cells, which regulate T cells through multiple mechanisms such as ligand-receptor interaction and tuning metabolic environment such as nutrients supply and hypoxia status.

Spatial Transcriptomics (ST) methods aim to resolve the gene transcriptomic profiles in the spatial tissue context. They are primarily divided into two categories: in situ hybridization (ISH)-based methods, also known as imaging-based methods, and sequencing-based methods. ISH-based methods include seqFISH, seqFISH+ and MERFISH, which rely on specially designed probe sets hybridized with transcripts to return signals for visualizing the transcriptional profiles. Sequencing-based methods include 10x Visium, SlideSeq, HDST and etc. These methods are based on scRNA-seq, which is able to sequence the whole genome without the requirement of probe design.

The major technical obstacle of current sequencing-based ST technologies is either low resolution or low coverage. For 10x Visium technology, the fresh frozen tissue is laid on the ST slides that consist of a matrix of spots. Each spot is 55um in diameter, which results in multiple cells ‘falling’ into a single spot. The subsequent step is sequencing a mixture of cells rather than isolated single cells, which bring ups the necessity of deconvolution. A sophisticated approach to deconvolve the cell composition in the spots is integration with scRNA-seq dataset from either reference or paralleled experiments. The integration approach benefits from two sides: 1) the scRNA-seq dataset helps annotate the cell population from the spots; 2) integration of ST and scRNA-seq data also gives the scRNA-seq-identified cell populations pseudolocation for better interpretation of the single cell results.

The following pipeline is to analyze ST dataset generated by 10x Visium technology, from normalization to integration with scRNA-seq dataset. It is based on the vignette from Satija Lab (https://satijalab.org/seurat/articles/spatial_vignette.html).

Data format of spatial transcriptomics

Pipeline Overview

Step 1. Normalization of sequencing data

Step 2. Dimensional reduction and clustering

Step 3. Spatial-DE analysis

Step 4. Integration with single-cell RNA-seq data

library(Seurat)
library(SeuratData)
library(ggplot2)
library(patchwork)
library(dplyr)

Here we use data generated by 10x Visium on mouse brain

InstallData("stxBrain")

Load data into Seurat object.

brain <- LoadData("stxBrain", type= "anterior1")
# or data can be loaded directly using Load10X_Spatial function
# e.g. 
# object <- Load10X_Spatial(data.dir, filename = "filtered_feature_bc_matrix.h5")

1. Normalization

Before starting data processing, the following figure shows that spots from different sites showed unevenly distributed molecular counts. Various molecular counts were detected in spots from different region so the difference of gene counts now does not reflect the true difference of gene transcriptional difference in different regions, which raises the necessity of normalization.

SpatialFeaturePlot overlays data on top of the tissue H&E image.

plot1 <- VlnPlot(brain, features="nCount_Spatial", pt.size=0.1)+NoLegend()
plot2 <- SpatialFeaturePlot(brain, features="nCount_Spatial")+theme(legend.position = "right")
wrap_plots(plot1, plot2)

Normalization is to remove the influence brought by technical factors:

Normalized gene expression level and variance of which across cells are independent of gene abundance or sequencing depth but reflect true biological heterogeneity.

SCTransform employs regularized negative binomial models of gene expression, which removes technical influence and spatial bias but retains biological variance, surpassing log-normalization that forces all data points to have same ‘underlying’ library sizes after normalization.

Normalized data is stored in assay “SCT”.

(Hafemeister and Satija, 2019)

brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)
plot3 <- VlnPlot(brain, features="nCount_SCT", pt.size=0.1)+NoLegend()
plot4 <- SpatialFeaturePlot(brain, features="nCount_SCT")+theme(legend.position = "right")
wrap_plots(plot3, plot4)

Use SpatialFeatureplot to visualize the spatial expression of individual genes after normalization. Hpca is a marker of hippocampus and Ttr is marker of choroid plexus.

SpatialFeaturePlot(brain, features = c("Hpca","Ttr"))

Adjust the size and opacity of spots to make the spatial expression of selected gene more clear.

p1 <- SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1)
p2 <- SpatialFeaturePlot(brain, features = "Ttr", alpha = c(0.1, 1))
p1 + p2

2. Dimensionality reduction and clustering

brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
ElbowPlot(brain, ndims=50)

brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30) # choose the top 30 most variably feature sets for the ease of computation as well as noise reduction
brain <- FindClusters(brain, verbose = FALSE)

brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)

p1 <- DimPlot(brain, reduction="umap", label=T) #umap
p2 <- SpatialDimPlot(brain, label = TRUE, label.size = 3) #overlay clusters to images

p1 + p2

Highlight the individual cluster alone on the image.

SpatialDimPlot(brain, cells.highlight = CellsByIdentities(object = brain, idents = c(2, 1, 4, 3, 5, 14)), facet.highlight = TRUE, ncol=3)

3. Identify spatially DE genes

3.1 DEG with known spatial anatomy annotation

Find differentially expressed genes that distinguish cluster 1 from cluster 14 and visualized the top 3 DE genes on tissue image. FindMarkers can be directly used here to identify DE genes since the clusters show clear spatial restrictions and well related to the known anatomy of brain.

de_markers <- FindMarkers(brain, ident.1 = 1, ident.2 = 14)
SpatialFeaturePlot(object = brain, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), ncol = 3)

3.2 find spatially variable genes without pre-annotation

FindSpatiallyVariableFeatures searches for genes whose expression level is dependent on their spatial location when the annotation is absent (without clustering).
Other methods such as SpatialDE and Splotch can be used for same purpose.

brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain)[1:1000], selection.method = "markvariogram")
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = "markvariogram"), 6)
SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))

4. Subsampling

This step is to subset the data to focus on specific regions of interest for further analysis such as integration with a scRNA-seq dataset. In this example, cluster 1, 2, 3, 4, 6, 7 roughly belongs to frontal cortex based on known brain anatomy. Data for these clusters are retrieved and stored into the new seurat object ‘cortex’.

cortex <- subset(brain, idents = c(1, 2, 3, 4, 6, 7))
SpatialDimPlot(cortex)

Visualize and remove the spots taht fall outside of the interested region.

SpatialDimPlot(cortex,cells.highlight = WhichCells(cortex, expression = anterior1_imagerow > 400 | anterior1_imagecol <150))

Remove unwanted spots.

cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE)
cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)

p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE)
p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3)
p1 + p2

5. Integration with scRNA-seq Data

In this pipeline, Satija Lab took an ‘anchor’-based integration approach to ‘deconvolute’ the spots and transfer annotations from scRNAseq dataset to ST dataset(Stuart and Butler et al., 2019).

Other Integration/Deconvolution approaches such as Harmony (Korsunsky et al., 2019), LIGER (Welch et al., 2019)…

The scRNA-seq dataset used here is generated from ~14000 adult mouse cortical cell taxonomy from Allen Institute, which will be used for annotating the cortex subset created above.

allen_reference <- readRDS("/fs/project/PAS1475/YiWang/Yi1/Test/allen_cortex.rds")

Normalize scRNA-seq data with SCTransform and perform scRNA-seq workflow.

allen_reference <- SCTransform(allen_reference, ncells = 3000, verbose = FALSE) %>% RunPCA(verbose = FALSE) %>% RunUMAP(dims = 1:30)
DimPlot(allen_reference, group.by = "subclass", label = TRUE)

Renormalize the cortex subset.

cortex <- SCTransform(cortex, assay = "Spatial", verbose = FALSE) %>% RunPCA(verbose = FALSE)
# the annotation is stored in the 'subclass' column of object metadata

FindTransferAnchors is used to find a set of anchors by first reducing the dimensionality of reference dataset and query dataset, then apply L2-normalization to the canonical correlation vectors, followed by searching for mutual nearest neighbours (MNNs) in this shared low-dimensional representation. The resulting cell pairs (cell-spot pair in this case?) are designated as anchors, as they encode the cellular relationships across datasets that will form the basis for all subsequent integration analyses.
(Stuart and Butler et al., 2019)

The anchors are used as the input for TransferData, which annotates the query dataset based on the reference cell identities.

anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT")
predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass, prediction.assay = TRUE, 
    weight.reduction = cortex[["pca"]], dims = 1:30)
cortex[["predictions"]] <- predictions.assay

The cortex is subdevided into 6 layers designated L1-L6. L4 receives sensory input and L2/3 is the first processing place. We can visualize the predicted spatial location of cells forming these layers.

DefaultAssay(cortex) <- "predictions"
SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2, crop = TRUE)

Use FindSpatiallyVariableFeatures to identify cell populations that mostly restricted spatially based on prediction score.

cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "markvariogram", 
    features = rownames(cortex), r.metric = 5, slot = "data")
top.clusters <- head(SpatiallyVariableFeatures(cortex), 4)
SpatialPlot(object = cortex, features = top.clusters, ncol = 2)

Visualize the location of cell populations on tissue image to confirm that the predicted cell population is recapturing the real brain structure.

SpatialFeaturePlot(cortex, features = c("Astro", "L2/3 IT", "L4", "L5 PT", "L5 IT", "L6 CT", "L6 IT", 
    "L6b", "Oligo"), pt.size.factor = 1, ncol = 2, crop = FALSE, alpha = c(0.1, 1))

6. Working with multiple slides

Load the data of another slide and perform normalization.

brain2 <- LoadData("stxBrain", type = "posterior1")
brain2 <- SCTransform(brain2, assay = "Spatial", verbose = FALSE)

Merge two objects.

brain.merge <- merge(brain, brain2)

Perform dimensional reduction and clustering on the two datasets.

DefaultAssay(brain.merge) <- "SCT"
VariableFeatures(brain.merge) <- c(VariableFeatures(brain), VariableFeatures(brain2))
brain.merge <- RunPCA(brain.merge, verbose = FALSE)
brain.merge <- FindNeighbors(brain.merge, dims = 1:30)
brain.merge <- FindClusters(brain.merge, verbose = FALSE)
brain.merge <- RunUMAP(brain.merge, dims = 1:30)

UMAP

DimPlot(brain.merge, reduction = "umap", group.by = c("ident", "orig.ident"))

Spatial visualization

SpatialDimPlot(brain.merge)

Spatial epxression of selected genes on two slides.

SpatialFeaturePlot(brain.merge, features = c("Hpca", "Plp1"))